EDA

NaN Value Check

Missing values are present in the following columns:

     #   Column                       Non-Null Count  Dtype
     1   circularity                  841 non-null    float64
     2   distance_circularity         842 non-null    float64
     3   radius_ratio                 840 non-null    float64
     4   pr.axis_aspect_ratio         844 non-null    float64
     6   scatter_ratio                845 non-null    float64
     7   elongatedness                845 non-null    float64
     8   pr.axis_rectangularity       843 non-null    float64
    10   scaled_variance              843 non-null    float64
    11   scaled_variance.1            844 non-null    float64
    12   scaled_radius_of_gyration    844 non-null    float64
    13   scaled_radius_of_gyration.1  842 non-null    float64
    14   skewness_about               840 non-null    float64
    15   skewness_about.1             845 non-null    float64
    16   skewness_about.2             845 non-null    float64
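A listing like this can be reproduced by counting nulls per column. The sketch below assumes the data is held in a pandas DataFrame; the frame `toy` and its values are made up for illustration and only borrow column names from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the values are invented.
toy = pd.DataFrame({
    "circularity": [45.0, np.nan, 50.0, 41.0],
    "scatter_ratio": [162.0, 149.0, np.nan, 144.0],
    "max.length_aspect_ratio": [10, 9, 10, 9],
})

# Number of missing values in each column
null_counts = toy.isnull().sum()

# Keep only the columns that actually contain missing values
cols_with_nulls = null_counts[null_counts > 0]
print(cols_with_nulls)
```

On the real data, `toy.info()` would give the non-null counts in the form shown above.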

Checking for outliers
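The summary does not say how outliers were detected. One common approach, assumed here, is the 1.5 x IQR rule that box plots use. The helper `iqr_outlier_counts` and the toy values are hypothetical.

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons against NaN are False, so missing values are never counted
    outside = (numeric < (q1 - k * iqr)) | (numeric > (q3 + k * iqr))
    return outside.sum()

# Invented values: radius_ratio has one extreme point, circularity has none
toy = pd.DataFrame({
    "radius_ratio": [150.0, 160.0, 155.0, 158.0, 152.0, 333.0],
    "circularity": [40.0, 42.0, 44.0, 45.0, 43.0, 41.0],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

Columns with a non-zero count would then fall into the "with outliers" group used for the null-value treatment.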

NaN value treatment

For columns with null values and no outliers

For columns with null values and outliers
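The summary does not record which fill value was used for each group. A common convention, assumed in this sketch, is the mean for columns without outliers and the median (which is robust to extreme values) for columns with outliers. The column grouping and the values below are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented values: circularity has no outliers, radius_ratio has one (333.0)
toy = pd.DataFrame({
    "circularity": [40.0, 42.0, np.nan, 44.0, 42.0],
    "radius_ratio": [150.0, 160.0, np.nan, 155.0, 333.0],
})

no_outlier_cols = ["circularity"]   # assumed grouping
outlier_cols = ["radius_ratio"]     # assumed grouping

imputed = toy.copy()
# Mean imputation where the distribution has no extreme values
imputed[no_outlier_cols] = imputed[no_outlier_cols].fillna(imputed[no_outlier_cols].mean())
# Median imputation where outliers would distort the mean
imputed[outlier_cols] = imputed[outlier_cols].fillna(imputed[outlier_cols].median())
print(imputed)
```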

Finding relationships between independent and dependent variables

The following pairs of columns show very high correlation:

    elongatedness and distance_circularity
    distance_circularity and scatter_ratio
    scatter_ratio and elongatedness
    scatter_ratio and pr.axis_rectangularity
    scaled_variance and elongatedness
    circularity and max.length_rectangularity
    scaled_variance.1 and scaled_variance

Some of the very highly correlated columns can be removed: elongatedness, pr.axis_rectangularity and scaled_variance.1 can be dropped.
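A minimal sketch of how such pairs can be found and the redundant columns dropped. The helper `high_corr_pairs`, the 0.9 threshold and the synthetic data are assumptions; only the column names and the drop list come from the summary.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(frame: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep the strict upper triangle so each pair appears only once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Synthetic data: elongatedness is built to track scatter_ratio closely
rng = np.random.default_rng(0)
scatter = rng.normal(170.0, 30.0, size=200)
toy = pd.DataFrame({
    "scatter_ratio": scatter,
    "elongatedness": -0.2 * scatter + rng.normal(0.0, 1.0, size=200),
    "circularity": rng.normal(45.0, 6.0, size=200),
})
pairs = high_corr_pairs(toy)
print(pairs)

# Drop list taken from the summary; only columns present in the toy frame are dropped
to_drop = ["elongatedness", "pr.axis_rectangularity", "scaled_variance.1"]
reduced = toy.drop(columns=[c for c in to_drop if c in toy.columns])
```

Dropping one column from each highly correlated pair keeps most of the information while reducing redundancy in the inputs.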

The following columns show very low correlation with the dependent (target) variable:

    pr.axis_aspect_ratio       0.038650
    max.length_aspect_ratio    0.036942
    skewness_about.2          -0.054890
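Values like these can be read off one column of the correlation matrix. The sketch assumes the target is label-encoded in a column named `class` (a hypothetical name) and uses a 0.1 cut-off chosen only for illustration; the data is invented.

```python
import numpy as np
import pandas as pd

target = np.repeat([1, 2, 3], 4)          # label-encoded classes
pattern = np.tile([1.0, 2.0], 6)          # varies independently of the class

toy = pd.DataFrame({
    "scatter_ratio": 10.0 * target + pattern,   # strongly tied to the class
    "pr.axis_aspect_ratio": pattern,            # unrelated to the class
    "class": target,
})

# Correlation of every feature with the target column
target_corr = toy.corr()["class"].drop("class")

# Features whose absolute correlation falls below the assumed cut-off
weak = target_corr[target_corr.abs() < 0.1]
print(target_corr)
```

Pearson correlation against an encoded class label only captures linear, ordinal-like relationships, so a low value here is a hint rather than proof that a feature is uninformative.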

Model

PCA with 8 components captures about 95% of the variance in the data.
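The number of components is usually picked from the cumulative explained variance. The sketch below assumes scikit-learn and standardised inputs (the summary does not say whether the features were scaled) and runs on synthetic data, so the component count it finds is not the 8 reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 5 latent factors mixed into 12 correlated features plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 12)) + 0.1 * rng.normal(size=(300, 12))

# PCA is scale-sensitive, so standardise first (an assumption about the workflow)
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components and accumulate the explained variance ratios
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 95%
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components, cumulative[n_components - 1])
```

Passing a fraction such as `PCA(n_components=0.95)` asks scikit-learn to make a similar selection itself.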

Accuracy and cross-validation scores of the support vector machine trained on raw data

Accuracy = 0.9566929133858267

Confusion matrix:

    [[123   1   2]
     [  0  64   3]
     [  3   2  56]]

Classification report:

                  precision    recall  f1-score   support

               1       0.98      0.98      0.98       126
               2       0.96      0.96      0.96        67
               3       0.92      0.92      0.92        61

        accuracy                           0.96       254
       macro avg       0.95      0.95      0.95       254
    weighted avg       0.96      0.96      0.96       254
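The reported accuracy, precision and recall follow directly from the confusion matrix. The check below assumes the scikit-learn convention of true classes in rows and predicted classes in columns, which agrees with the row sums matching the support column.

```python
# Confusion matrix of the raw-data model, copied from the results above
cm = [
    [123, 1, 2],
    [0, 64, 3],
    [3, 2, 56],
]
n = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = sum(cm[i][i] for i in range(n)) / total

# Recall: diagonal over the row sum; precision: diagonal over the column sum
recalls = [cm[i][i] / sum(cm[i]) for i in range(n)]
precisions = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]

print(f"accuracy = {accuracy:.4f}")
for label, (p, r) in enumerate(zip(precisions, recalls), start=1):
    print(f"class {label}: precision = {p:.2f}, recall = {r:.2f}")
```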

Cross-validation score Accuracy: 97.763% (2.161%)
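A sketch of how figures of this kind are typically produced with scikit-learn. The synthetic data, the 70/30 split, the default `SVC` settings and the 10-fold scheme are all assumptions, as is reading the cross-validation line as a mean with a standard deviation in parentheses.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic three-class data standing in for the real features
X, y = make_classification(n_samples=300, n_features=18, n_informative=8,
                           n_classes=3, random_state=0)

# Hold out a test set (the split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit the SVM on the raw features and evaluate on the held-out set
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# 10-fold cross-validation, reported as mean and standard deviation
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=folds)
print("Accuracy: %.3f%% (%.3f%%)" % (scores.mean() * 100, scores.std() * 100))
```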

Accuracy and cross-validation scores of the support vector machine trained on principal components

Accuracy = 0.9448818897637795

Confusion matrix:

    [[121   2   3]
     [  0  65   2]
     [  4   3  54]]

Classification report:

                  precision    recall  f1-score   support

               1       0.97      0.96      0.96       126
               2       0.93      0.97      0.95        67
               3       0.92      0.89      0.90        61

        accuracy                           0.94       254
       macro avg       0.94      0.94      0.94       254
    weighted avg       0.94      0.94      0.94       254

Cross-validation score Accuracy: 95.269% (3.189%)
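When cross-validating a model on principal components, one option is to wrap scaling, PCA and the classifier in a single pipeline so the projection is refit on each training fold. This is a design suggestion rather than a record of how the scores above were obtained. The data is synthetic, and only the choice of 8 components comes from the summary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the real features
X, y = make_classification(n_samples=300, n_features=18, n_informative=8,
                           n_classes=3, random_state=0)

# Scaling and PCA are fitted inside each fold, which avoids leaking
# information from the validation part into the projection
pipeline = make_pipeline(StandardScaler(), PCA(n_components=8), SVC())

folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=folds)
print("Accuracy: %.3f%% (%.3f%%)" % (scores.mean() * 100, scores.std() * 100))

# Fit once on all data to inspect the reduced dimensionality
pipeline.fit(X, y)
reduced_dim = pipeline.named_steps["pca"].n_components_
```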

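Both models performed comparably. The model trained on raw data scored slightly higher (test accuracy 0.957 vs 0.945, cross-validation accuracy 97.763% vs 95.269%), while the model trained on principal components used only 8 components.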